Bound databricks-cli login timeout to fix indefinite WSL hang #1921
*Why*

With `auth_type=databricks-cli`, a stale or expired cached token makes the extension run `databricks auth login`, whose browser OAuth challenge defaults to a 1-hour timeout in the bundled CLI. In WSL the Linux CLI cannot open the Windows browser or receive the localhost callback, so the flow never completes and the extension appears to hang indefinitely on "Attempting to configure auth: databricks-cli" (#1917). The user is left with no feedback and no guidance.

*What*

- Pass `--timeout 300s` to `auth login` so a non-completing browser flow fails fast (5 minutes is ample for a human login and far below the 1-hour default).
- On login failure, surface an actionable error telling the user to run `databricks auth login --profile <p>` (or `--host <h>`) in a terminal and reload, instead of only the raw CLI error.
- Make the exec function injectable on `DatabricksCliCheck` so login argument-building and error handling are unit-testable without spawning the real CLI.

This is the pragmatic fix for the reported symptom (the hang). The underlying token-cache host/profile split-brain is a separate CLI-side issue tracked upstream; fixing it is not required to stop the hang.

*Verification*

- New `DatabricksCliCheck.test.ts` covers: bounded `--timeout` passed with a profile, `--host` used without a profile, and the actionable failure message.
- `tsc --noEmit` clean; eslint and prettier clean on the changed files.
- (The repo's VS Code mocha harness cannot run under this environment's Node 26 due to an unrelated transitive dependency; behavior was verified against the compiled output with the SDK/vscode modules stubbed.)
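The argument-building described above can be sketched as a small pure function. This is a minimal sketch, not the extension's code: `buildLoginArgs`, `LoginTarget`, and `LOGIN_TIMEOUT` are hypothetical names, and the rule that a profile takes precedence over the host is an assumption inferred from the test descriptions ("`--host` used without a profile"). Only the `auth login`, `--timeout 300s`, `--profile`, and `--host` arguments come from the PR.

```typescript
// Hypothetical target description: the workspace host is always known,
// a CLI profile may or may not be configured.
type LoginTarget = { host: string; profile?: string };

// 5 minutes: ample for a human to finish the browser flow,
// far below the bundled CLI's 1-hour default.
const LOGIN_TIMEOUT = "300s";

// Hypothetical helper: builds the argv for `databricks auth login`.
function buildLoginArgs(target: LoginTarget): string[] {
    // Always bound the browser OAuth flow so a non-completing login fails fast.
    const args = ["auth", "login", "--timeout", LOGIN_TIMEOUT];
    if (target.profile) {
        // Assumption: a configured profile takes precedence over the raw host.
        args.push("--profile", target.profile);
    } else {
        args.push("--host", target.host);
    }
    return args;
}
```

Keeping this a pure function means the two argument cases can be checked by comparing arrays, with no process spawned.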
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below: Trigger: Inputs:
Checks will be approved automatically on success.

[EMU] Integration tests: https://github.com/databricks-eng/eng-dev-ecosystem/actions/runs/27832465346
```diff
     args.push("--host", host);
 }
-await execFile(
+await this.execFileFn(
```
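The diff replaces the direct `execFile` call with `this.execFileFn`. One way such an injection seam can look is sketched here, assuming the injected function has the shape of a promisified Node `child_process.execFile`. `CliLoginSketch` and `ExecFileFn` are hypothetical stand-ins for `DatabricksCliCheck`, the error wording is illustrative rather than the extension's actual message, and for brevity only the profile case is handled.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

// Assumed shape of the injectable exec function: file + argv in,
// captured stdout/stderr out (matches a promisified execFile).
type ExecFileFn = (
    file: string,
    args: string[]
) => Promise<{ stdout: string; stderr: string }>;

// Production default: spawns the real CLI binary.
const execFileAsync = promisify(execFile);
const realExecFile: ExecFileFn = (file, args) => execFileAsync(file, args);

// Hypothetical stand-in for DatabricksCliCheck; only the seam is shown.
class CliLoginSketch {
    private readonly cliPath: string;
    private readonly execFileFn: ExecFileFn;

    constructor(cliPath: string, execFileFn: ExecFileFn = realExecFile) {
        this.cliPath = cliPath;
        this.execFileFn = execFileFn;
    }

    async login(profile: string): Promise<void> {
        const args = ["auth", "login", "--timeout", "300s", "--profile", profile];
        try {
            // Calls through the injected function, so tests can observe argv
            // and simulate failures without spawning a process.
            await this.execFileFn(this.cliPath, args);
        } catch (e) {
            const cause = e instanceof Error ? e.message : String(e);
            // Replace the raw CLI error with an actionable next step.
            throw new Error(
                `Databricks CLI login failed (${cause}). ` +
                    `Run "databricks auth login --profile ${profile}" in a terminal, then reload the window.`
            );
        }
    }
}
```

In a unit test, pass a fake as the second constructor argument (one that records the argv and resolves, or one that rejects), then assert on the recorded arguments or on the rejected error's message. Production callers omit the argument and get the real `execFile`.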
Can this be fixed by setting the `BROWSER` env var for WSL environments? (A CLI-side fix would be better.)
More details:
https://medium.com/@pcbowers/wsl-windows-10-allow-web-links-to-open-automatically-27bdc53d6f86
https://superuser.com/questions/1262977/open-browser-in-host-system-from-windows-subsystem-for-linux
misha-db left a comment
This approach reduces the pain but doesn't fix the problem. We can merge this, but we should investigate a complete fix for WSL (in a separate ticket?).
@misha-db, I investigated the proper fix, and it seems we may not want to ship it: it would add fairly complex logic to the CLI without enough benefit. Since the affected user population is small and there is a clear workaround, I limited the changes to the VS Code extension: actionable instructions plus a timeout on login.
Why

Fixes #1917.

With `auth_type=databricks-cli`, a stale or expired cached token makes the extension run `databricks auth login`, whose browser OAuth challenge defaults to a 1-hour timeout in the bundled CLI. In WSL the Linux CLI cannot open the Windows browser or receive the localhost callback, so the flow never completes and the extension appears to hang indefinitely on "Attempting to configure auth: databricks-cli", with no feedback and no guidance.

This PR fixes the hang (the reported symptom). The underlying token-cache host/profile split-brain that triggers the re-login is separate CLI-side state; the broader direction there is profile-first token caching handled upstream in the Databricks CLI, and in the meantime a one-time `databricks auth login --profile <name>` resolves it permanently. See the issue thread for the full root-cause walkthrough.

What

- Pass `--timeout 300s` to `auth login` so a non-completing browser flow fails fast (5 minutes is ample for a human login and far below the 1-hour default) instead of stalling.
- On login failure, surface an actionable error telling the user to run `databricks auth login --profile <p>` (or `--host <h>`) in a terminal and reload, instead of only the raw CLI error.
- Make the exec function injectable on `DatabricksCliCheck` so login argument-building and error handling are unit-testable without spawning the real CLI.

Verification

- New `DatabricksCliCheck.test.ts` covers: bounded `--timeout` passed with a profile, `--host` used without a profile, and the actionable failure message.
- `tsc --noEmit` clean; eslint and prettier clean on the changed files.